ABSTRACT

As FPGA densities increase, partitioning-based FPGA placement
approaches are becoming increasingly important because they can
provide high-quality and computationally scalable solutions.
However, modern FPGA architectures incorporate heterogeneous
resources, which place additional requirements on the partitioning
algorithms: they must not only minimize the cut and balance the
partitions, but also ensure that none of the resources in each
partition is oversubscribed. In this paper, we present a number of
multilevel multi-resource partitioning algorithms that are guaranteed
to produce solutions that balance the utilization of the different
resources across the partitions. We evaluate our algorithms on twelve
industrial benchmarks ranging in size from 5,236 to 140,118 vertices
and show that they achieve minimal degradation in the min-cut while
balancing the various resources. Comparing the quality of the solution
produced by some of our algorithms against that produced by hMETIS,
we show that our algorithms are capable of balancing the different
resources while incurring only a 3.3%-5.7% higher cut.
1. INTRODUCTION

In recent years, due to the development of high-quality multilevel
hypergraph partitioning algorithms [9, 2], partitioning-based
placement has emerged as a promising approach for placing large
designs on ASICs. These methods have been shown to be computationally
scalable, to produce high-quality solutions, and to scale to very
large designs [13, 1]. Moreover, as FPGA densities increase, the
characteristics of this placement methodology are becoming
increasingly important for placing large designs on FPGAs as
well [12].

However, ASICs are in general homogeneous, so the only constraint
that they impose on the partitioning algorithm is that of balancing
the area of the cells assigned to the different partitions. In
contrast, modern FPGA architectures incorporate heterogeneous
resources (e.g., CLBs, multipliers, RAM blocks, IP cores [16], etc.).
This places additional constraints on the type of partitionings that
need to be computed, as the partitioning algorithm must now ensure
that the resources used in each partition can be accommodated by the
resources provided at the different regions of the FPGA. For example,
a partitioning solution that places most of the FFs on one side of
the bisection and most of the RAM blocks on the other side, even if
it is balanced in terms of the total number of cells on either side
of the cut, is not very useful for FPGA placement because it may
oversubscribe these two resource types.

As a result, existing partitioning algorithms [9, 2, 4, 14, 7] cannot
be used to develop partitioning-based placement methods for FPGAs
with heterogeneous resources, as they can lead to partitionings that
have highly unbalanced resource requirements. To illustrate this, we
used a multilevel hypergraph partitioning algorithm (hMETIS [10]) to
bisect twelve different circuits synthesized for the Xilinx Virtex II
architecture, which contain cells that map to different resources.
Various statistics measuring the balance of the different resource
types are shown in Table 1. These results show that even though the
bisection achieves a balance of 49%-51% in terms of the number of
cells assigned to each partition, individual resources are in general
considerably more unbalanced.

Table 1: The distribution of unbalance factors of different types of
cells, for 49%-51% bisection. For the partition to be feasible, the
unbalance factor of each cell-type must be below 2.0. The column
“min ub” shows the minimum unbalance factor, “max ub” shows the
maximum unbalance factor, “avg ub” shows the average unbalance
factor, and “# viol” shows the number of cell-types in violation by
exceeding the unbalance factor of 2.0.

In this paper, we present a new class of multi-resource hypergraph
bisectioning algorithms that are capable of producing partitioning
solutions that simultaneously balance the different resources
assigned to each one of the partitions, and thus can be used to power
partitioning-based placement methodologies for emerging FPGA
architectures. Specifically, we present five different multi-resource
partitioning algorithms that are based on the multilevel hypergraph
partitioning paradigm. Three of these algorithms solve the problem by
balancing the different resources at the same time that they compute
the bisection, while the other two are used to post-process a
high-quality but potentially unbalanced solution to enforce the
multiple balancing constraints. We experimentally evaluated the
performance of these algorithms on twelve different industrial
circuits containing up to 140,118 cells. Our results show that each
one of these algorithms is capable of producing solutions that satisfy
the multiple balancing constraints and that they achieve different
time-quality trade-offs. Moreover, comparing the quality of the
solution produced by some of our algorithms against that produced by
hMETIS, we show that our algorithms are capable of balancing the
different resources while incurring only a 3.3%-5.7% higher cut.

The rest of the paper is organized as follows. Section 2 defines
various concepts and terms that are used in the paper and presents a
brief overview of the multilevel partitioning paradigm. Section 3
provides a formal definition of the multi-resource partitioning
problem. Section 4 describes the various multi-resource partitioning
algorithms that we developed. Section 5 presents a comprehensive
experimental evaluation of these algorithms. Finally, Section 6
provides some concluding remarks.

2.  NOTATION  AND  BACKGROUND 

A hypergraph $G = (V, E)$ is a set of vertices $V$ and a set of
hyperedges $E$. Each hyperedge is a subset of the set of vertices $V$.
The size of a hyperedge is the cardinality of this subset. A vertex
$v$ is said to be incident on a hyperedge $e$ if $v \in e$. Each
vertex $v$ and hyperedge $e$ has a weight associated with it, denoted
by $w(v)$ and $w(e)$, respectively. A circuit/netlist consisting of a
set of cells and a set of nets can be directly represented via a
hypergraph, whose vertices correspond to the cells and whose
hyperedges correspond to the nets. Due to this one-to-one
correspondence between hypergraphs and netlists, we will use the terms
vertices/cells and hyperedges/nets interchangeably throughout this
paper.

A bisection of $V$ is denoted by a vector $P$ such that $P[i]$
indicates the partition number that vertex $i$ belongs to. The cut of
the bisection is equal to the sum of the weights of the hyperedges
that connect vertices belonging to different partitions. We say that
a bisection $P$ of $V$ satisfies a single balancing constraint
specified by $[l, u]$, where $l < u$, iff
$l \le \sum_{v \in V_i} w(v) \le u$ for each partition $V_i$. A
bisection that satisfies the constraint is called feasible; otherwise
it is infeasible. Given these definitions, the hypergraph bisection
problem is formally defined as follows: Given a hypergraph
$G = (V, E)$ and a balancing constraint $[l, u]$, find a feasible
bisection $P$ of $G$ that minimizes the cut. Since there is only a
single balancing requirement, this formulation is usually referred to
as the single-constraint bisectioning problem [5].
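
The two definitions above (the cut of a bisection and single-constraint
feasibility) can be illustrated with a minimal Python sketch. It is
not taken from hMETIS; the function names, the 0/1 partition numbering,
and the tiny netlist are assumptions made only for illustration.

```python
def cut_weight(hyperedges, edge_weights, part):
    """Sum the weights of hyperedges that connect vertices in different partitions."""
    total = 0
    for edge, weight in zip(hyperedges, edge_weights):
        if len({part[v] for v in edge}) > 1:  # the hyperedge spans both partitions
            total += weight
    return total


def is_feasible(vertex_weights, part, lower, upper):
    """Check lower <= (total vertex weight of each partition) <= upper."""
    sizes = [0, 0]
    for v, weight in enumerate(vertex_weights):
        sizes[part[v]] += weight
    return all(lower <= s <= upper for s in sizes)


# Hypothetical 4-cell netlist with 3 nets (illustrative data only).
hyperedges = [(0, 1), (1, 2, 3), (2, 3)]
edge_weights = [1, 2, 1]
vertex_weights = [1, 1, 1, 1]
part = [0, 0, 1, 1]  # P[i]: partition of vertex i, numbered 0/1 here

cut = cut_weight(hyperedges, edge_weights, part)
feasible = is_feasible(vertex_weights, part, lower=2, upper=2)
```

Only the middle net has cells on both sides in this example, so the cut
equals that net's weight.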

3.  PROBLEM  DEFINITION 

Historically, FPGA devices contained a single type of resource (CLBs,
for example) that was uniformly distributed throughout the chip.
However, taking advantage of ever-increasing silicon densities,
modern FPGA devices contain multiple types of resources, which allow
them to efficiently implement complex and high-performance designs.
One such example is the recently introduced Virtex II architecture
from Xilinx, which contains specialized resources such as multiplier
and RAM blocks interspersed among CLBs. As a result, designs created
for such modern FPGAs try to proactively make use of these
specialized resources in order to obtain better performance and
versatility.

For partitioning-driven placement to succeed in utilizing these
different resource types, the partitioning algorithms need to take
them into account and balance each type of cell across the cut lines.
Motivated by this observation we focus on multi-resource aware
partitioning, which can be formally defined as follows. Consider an
FPGA architecture with $m$ distinct resource types, and let
$c_l^{i,j}$ denote the minimum number of resources of type $i$
allowed in partition $j$, and $c_u^{i,j}$ the maximum number of
resources of type $i$ allowed in partition $j$. Then the
multi-resource bisection $P$ of $G$ seeks to minimize the cut subject
to:

$$c_l^{i,j} \;\le \sum_{\forall v \in V :\, P[v] = j \text{ and } t(v) = i} 1 \;\le\; c_u^{i,j}$$

for $j = 1, 2$ and $i = 1, 2, \ldots, m$, where $t(v)$ is the resource
type required by cell $v$. Note that this is a general definition of
the multi-resource bisection, and in most cases only the upper bound
is needed. Furthermore, when the number of cells of a certain type is
small and odd, it can be impossible to satisfy the balance
constraint. In such cases the balance constraint needs to be relaxed.
For example, if there are only 3 cells of a certain type present,
then a balance constraint of 49%-51% is impossible to satisfy and
needs to be relaxed to 33%-67% to accommodate them.
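
As a rough illustration of the constraint above, the following Python
sketch counts the cells of each type on each side of a bisection and
checks them against per-type bounds. The relaxation rule used when no
integer split fits the percentage window (falling back to the most even
split possible) is an assumption chosen to reproduce the 3-cell example
in the text; the function names, the use of identical bounds for both
partitions, and the data are hypothetical.

```python
from collections import Counter


def relaxed_bounds(n_cells, lo_pct=49, hi_pct=51):
    """Integer bounds (c_l, c_u) on the per-partition count of one cell type.

    Assumption: if no integer count fits the percentage window, fall back
    to the most even split, e.g. 1-2 (33%-67%) for 3 cells.
    """
    c_l = -(-lo_pct * n_cells // 100)  # ceiling of lo_pct% of n_cells
    c_u = hi_pct * n_cells // 100      # floor of hi_pct% of n_cells
    if c_l > c_u:
        c_l, c_u = n_cells // 2, n_cells - n_cells // 2
    return c_l, c_u


def is_multi_resource_feasible(part, cell_type, lo_pct=49, hi_pct=51):
    """Check c_l <= #(cells of type i in partition j) <= c_u for all i, j."""
    totals = Counter(cell_type)
    counts = [Counter(), Counter()]
    for v, p in enumerate(part):
        counts[p][cell_type[v]] += 1
    for t, n in totals.items():
        c_l, c_u = relaxed_bounds(n, lo_pct, hi_pct)
        for j in (0, 1):
            if not c_l <= counts[j][t] <= c_u:
                return False
    return True


# Hypothetical design: 4 LUTs and 3 RAM blocks.
cell_type = ["LUT"] * 4 + ["RAM"] * 3
balanced = [0, 0, 1, 1, 0, 1, 1]  # 2/2 LUTs and 1/2 RAMs
skewed = [0, 0, 1, 1, 1, 1, 1]    # all RAMs on one side

ok_balanced = is_multi_resource_feasible(balanced, cell_type)
ok_skewed = is_multi_resource_feasible(skewed, cell_type)
```

Both bisections split the seven cells almost evenly by count, yet only
the first one keeps the RAM blocks within their relaxed bounds.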

4. MULTI-RESOURCE PARTITIONING ALGORITHMS FOR FPGAS

To solve the multi-resource bisectioning problem we developed two
classes of multi-resource partitioning algorithms. The first class
computes the overall solution by constructing a bisection that
simultaneously balances the multiple resources, whereas the second
class achieves the desired balance by modifying a bisection that was
initially obtained using a traditional single-constraint bisectioning
algorithm. We will refer to the first class as the native
multi-resource partitioning algorithms and to the second class as the
multi-resource enforcement algorithms. The details of the various
algorithms in each of these classes are provided in the rest of this
section.


4.1 Native Multi-Resource Partitioning Algorithms

We developed three different algorithms, called multi-phase,
multi-constraint, and multi-phase-multi-constraint, that are capable
of directly computing a partitioning that balances the different
resources. These algorithms were motivated by recently developed graph
partitioning algorithms for partitioning finite element meshes
arising in multi-phase and multi-physics scientific numerical
simulations [11, 3]. Specifically, our multi-phase algorithm is based
on the graph partitioning algorithm proposed in [3], our
multi-constraint algorithm is based on the graph partitioning
algorithm proposed in [11], whereas the multi-phase-multi-constraint
algorithm combines elements from both of these approaches. Details on
these algorithms are provided in the remainder of this section.

4.1.1  Multi-Phase  Bisection  (MP) 

The basic idea of this algorithm is very simple. First we construct a
series of hypergraphs containing cells of type 1 ($H_1$), cells of
types 1 and 2 ($H_2$), cells of types 1, 2, and 3 ($H_3$), and so on.
The hyperedges for these sub-hypergraphs are reconstructed based on
the information from the original hypergraph. After that, hMETIS is
used to obtain a partition of $H_1$. Using the partition information
of $H_1$, we can directly assign partitions to the cells of type 1 in
$H_2$. To obtain the bisection of the type 2 cells of $H_2$, we fix
the cells of type 1 (and set their area to zero) and use hMETIS as
usual, which generates the partition information for the cells of
type 2. Partition information for the cells of type 1 and type 2 is
now available. This partitioning satisfies the balance constraints
for both types: the balance of the type 1 cells is preserved because
they were fixed vertices, and the balance constraint of the type 2
cells is satisfied by hMETIS (because the area of the type 1 cells
was set to zero). We continue this process by influencing the
partitioning of $H_3$ with the partition information of cell types 1
and 2 from $H_2$. Next, we handle $H_4$ by using the partition
information from $H_3$, and so on.

Since it is easier to influence the bisection of a smaller subset of
cells from the partition information of a larger subset of cells, we
reorder the types such that type 1 has the most cells, type 2 the
second most, and so on.
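
The construction of the nested sub-hypergraphs can be sketched as
follows. The text does not spell out how hyperedges are reconstructed,
so this sketch assumes one plausible reading: each original hyperedge
is restricted to the cells that are present, and hyperedges left with
fewer than two cells are dropped. All names and data are hypothetical,
and the partitioner itself is not shown.

```python
from collections import Counter


def order_types_by_count(cell_type):
    """Order the types so that type 1 has the most cells, type 2 the second most, ..."""
    counts = Counter(cell_type)
    return sorted(counts, key=lambda t: (-counts[t], str(t)))


def nested_sub_hypergraphs(hyperedges, cell_type):
    """Return [(types in H_k, hyperedges of H_k)] for k = 1, 2, ..., m.

    Assumption: a hyperedge of H_k is an original hyperedge restricted to
    cells whose type is among the first k types; it is kept only if at
    least two cells remain.
    """
    subs = []
    kept = set()
    for t in order_types_by_count(cell_type):
        kept.add(t)
        edges = []
        for edge in hyperedges:
            restricted = tuple(v for v in edge if cell_type[v] in kept)
            if len(restricted) >= 2:
                edges.append(restricted)
        subs.append((frozenset(kept), edges))
    return subs


# Hypothetical netlist: 3 LUTs, 2 FFs, 1 RAM block.
cell_type = ["LUT", "LUT", "LUT", "FF", "FF", "RAM"]
hyperedges = [(0, 1, 3), (1, 2), (2, 4, 5), (3, 5)]
subs = nested_sub_hypergraphs(hyperedges, cell_type)
```

In the multi-phase scheme, $H_1$ would be bisected first, and each
later $H_k$ would be bisected with the previously assigned cells held
fixed.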

4.1.2  Multi-Constraint  Bisection  (MC) 

The multi-resource partitioning problem can be naturally solved using
the multi-constraint partitioning formulation initially developed in
the context of graphs. Specifically, using the general framework
introduced in [11], we extend the hypergraph model so that each
vertex $v$ has a weight vector $w(v)$ of size $m$ associated with it.
The $i$th component of this vector, $w_i(v)$, corresponds to the
weight associated with the $i$th constraint. This model assumes,
without loss of generality, that the weight vectors of the vertices
satisfy the property that $\sum_{\forall v \in V} w_i(v) = 1.0$ for
$i = 1, 2, \ldots, m$. Using a framework analogous to that used for
single-constraint problems, we allow for $m$ lower- and upper-bound
constraints on the size of each partition, $(l_i, u_i)$ for
$i = 1, 2, \ldots, m$, such that $0 < l_i < u_i$ and
$l_i + u_i = 1$. Given these definitions, the multi-constraint
hypergraph bisection problem is formally defined as follows:

Compute a bisection $P$ of $V$ that minimizes the sum of the weights
of the hyperedges that span multiple partitions subject to the
constraint that

$$l_i \;\le \sum_{\forall v \in V :\, P[v] = j} w_i(v) \;\le\; u_i,$$

where $j = 1, 2$ indexes the partitions and $i = 1, 2, \ldots, m$
indexes the different vertex weights. This multi-constraint
partitioning problem tries to find a bisection such that each weight
is individually balanced within the specified lower- and upper-bound
tolerances.

Using this multi-constraint partitioning problem formulation, the
multi-resource partitioning problem can be formulated as follows.
Given a multi-resource hypergraph $G = (V, E)$ with $m$ different
vertex types, each vertex $v \in V$ is assigned a vector of $m$
vertex weights $w(v)$, such that $w_{t(v)}(v) = 1$ and
$w_i(v) = 0$ for all $i \ne t(v)$.

It is easy to see that a feasible multi-constraint solution of this
hypergraph will correspond to a feasible solution for the
multi-resource partitioning problem as well.
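
The mapping from cell types to weight vectors can be sketched in
Python as below. Each cell receives a weight vector with a single 1 in
the position of its type, and the per-partition sums are divided by
the per-type totals so that every weight sums to 1.0 over $V$, as the
model assumes. Exact fractions avoid rounding issues. The helper names,
the 0/1 partition numbering, and the example bounds are illustrative
assumptions.

```python
from fractions import Fraction


def weight_vector(t, types):
    """Weight vector of a cell of type t: 1 for its own type, 0 elsewhere."""
    return [1 if t == u else 0 for u in types]


def normalized_partition_weights(part, cell_type, types):
    """Return W[j][i]: the fraction of the i-th weight that lies in partition j."""
    m = len(types)
    totals = [0] * m
    sums = [[0] * m, [0] * m]
    for v, p in enumerate(part):
        w = weight_vector(cell_type[v], types)
        for i in range(m):
            totals[i] += w[i]
            sums[p][i] += w[i]
    return [[Fraction(sums[j][i], totals[i]) for i in range(m)] for j in (0, 1)]


def is_multi_constraint_feasible(part, cell_type, types, lower, upper):
    """Check l_i <= W[j][i] <= u_i for both partitions and every weight."""
    W = normalized_partition_weights(part, cell_type, types)
    return all(lower[i] <= W[j][i] <= upper[i]
               for j in (0, 1) for i in range(len(types)))


# Hypothetical design with two resource types, each present at least once.
types = ["LUT", "RAM"]
cell_type = ["LUT"] * 4 + ["RAM"] * 2
lower = [Fraction(45, 100)] * 2
upper = [Fraction(55, 100)] * 2

good = is_multi_constraint_feasible([0, 0, 1, 1, 0, 1], cell_type, types, lower, upper)
bad = is_multi_constraint_feasible([0, 0, 1, 1, 0, 0], cell_type, types, lower, upper)
```

Because each weight vector has exactly one non-zero entry, balancing
the $i$th weight is the same as balancing the number of cells of
type $i$.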

We have developed a multi-constraint hypergraph partitioning
algorithm that follows the traditional structure of the multilevel
partitioning paradigm. Specifically, we developed algorithms for the
coarsening, initial partitioning, and uncoarsening phases that combine
elements of the single-constraint hypergraph partitioning algorithms
in hMETIS with the multi-constraint extensions initially introduced
for graph partitioning [11]. Due to space constraints, in this paper
we will only describe the multi-constraint partitioning refinement
algorithm used during the uncoarsening phase, as it is an integral
part of many of the approaches presented in this paper. Interested
readers should refer to [11, 8, 5] for further details.

Multi-constraint Refinement (MC-FM). We developed a multi-constraint
bisection refinement algorithm, called MC-FM, which is based on the
widely used single-constraint FM algorithm [6] and operates as
follows. For each one of the two partitions, it maintains $m$
priority queues, where $m$ is the number of weights. A vertex belongs
to only a single priority queue depending on the relative order of
the weights in its weight vector. In particular, a vertex $v$ with
weight vector $(w_1(v), w_2(v), \ldots, w_m(v))$ belongs to the $j$th
queue if $w_j(v) = \max_i(w_i(v))$. Given these $2m$ queues, the
algorithm starts by inserting all the vertices into the appropriate
queues according to their gains. Then it proceeds by selecting one of
these $2m$ queues, picking the highest-gain vertex from this queue,
and moving it to the other partition. The queue is selected as
follows. If the current bisection represents a feasible solution,
then the queue that contains the highest-gain vertex among the $2m$
vertices at the top of the priority queues is selected. On the other
hand, if the current bisection is infeasible, then the queue is
selected depending on the relative weights of the two partitions.
Specifically, if $A$ and $B$ are the two partitions, then the
algorithm selects the queue corresponding to the largest $w_i(x)$
with $x \in \{A, B\}$ and $i = 1, 2, \ldots, m$. If the selected
queue is empty, then the algorithm selects a vertex from the
non-empty queue corresponding to the next heaviest weight of the same
partition. For example, if $m = 3$,
$(w_1(A), w_2(A), w_3(A)) = (.43, .60, .52)$, and
$(w_1(B), w_2(B), w_3(B)) = (.57, .40, .48)$, the algorithm will
select the second queue of partition $A$. If this queue is empty, it
will then try the third queue of $A$, followed by the first queue of
$A$. Note that we give preference to the third queue of $A$ as
opposed to the first queue of $B$, even though $B$ has more of the
first weight than $A$ does of the third. This is because our goal is
to reduce the second weight of $A$. If the second queue of $A$ is
non-empty, we select the highest-gain vertex from that queue and move
it to $B$. However, if this queue is empty, we would still like to
decrease the second weight of $A$, and the only way to do that is to
move a node from $A$ to $B$. This is why, when our first-choice queue
is empty, we select the most promising node from the same partition
that this first-choice queue belongs to.
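
The queue-selection rule for an infeasible bisection can be sketched as
follows. This is a simplified illustration rather than the authors'
implementation: queues are represented only by their sizes, gains are
not modeled, and the function name, the dictionary layout, and the
zero-based queue indices are assumptions.

```python
def select_queue_infeasible(part_weights, queue_sizes):
    """Pick (partition, queue index) to move a vertex from when the bisection is infeasible.

    part_weights[x][i] is the i-th weight of partition x;
    queue_sizes[x][i] is the number of vertices in the i-th queue of x.
    Queue indices are zero-based. Returns None if every queue of the
    chosen partition is empty.
    """
    # The partition that holds the single largest weight component.
    heavy = max(part_weights, key=lambda x: max(part_weights[x]))
    # Try that partition's queues from heaviest weight to lightest.
    m = len(part_weights[heavy])
    order = sorted(range(m), key=lambda i: -part_weights[heavy][i])
    for i in order:
        if queue_sizes[heavy][i] > 0:
            return heavy, i
    return None


# The example from the text (m = 3).
weights = {"A": [0.43, 0.60, 0.52], "B": [0.57, 0.40, 0.48]}

first = select_queue_infeasible(weights, {"A": [5, 5, 5], "B": [5, 5, 5]})
second = select_queue_infeasible(weights, {"A": [5, 0, 5], "B": [5, 5, 5]})
third = select_queue_infeasible(weights, {"A": [5, 0, 0], "B": [5, 5, 5]})
```

With zero-based indices, the three calls select the second, third, and
first queue of $A$, respectively, which matches the order described in
the text; the first queue of $B$ is never chosen ahead of a non-empty
queue of $A$.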

4.1.3  Multi-Phase  Multi-Constraint  (MPMC) 

This algorithm incorporates the features of both multi-phase
bisection and multi-constraint bisection. The general structure is
similar to that of Section 4.1.1, but when constructing the
sub-hypergraphs ($H_1, H_2, \ldots, H_m$), it also incorporates
pseudo hyperedges to retain the information of the original
hypergraph more accurately and to prevent these sub-hypergraphs from
becoming sparse and breaking into disconnected segments. This problem
is especially severe when numerous constraints are present, and it
results in a highly disconnected $H_1$. The bisection of such a
trivial hypergraph $H_1$ may not correspond well to the min-cut
bisection of the original hypergraph.

Adding pseudo hyperedges is done in the following way. When a vertex
is removed, its neighbors are analyzed to determine how closely each
neighbor is connected to the removed vertex. If the connectivity is
larger than 10% of the average hyperedge weight, then these neighbors
are considered to be connected to the removed vertex and are
connected by a lightweight pseudo hyperedge. The connectivity to
neighbors is estimated by representing each hyperedge by a clique of
edges, each with a weight of $w(e)/(|e| - 1)$, and by summing the
weights of the edges common to each neighbor and the removed vertex.
The pseudo hyperedges introduced do not participate in estimating
connectivity. These settings work very well for our purpose, as
evident in Section 5, but may require fine tuning depending on the
application.
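
The connectivity estimate can be sketched in Python as below. Each
hyperedge $e$ containing the removed vertex contributes
$w(e)/(|e| - 1)$ to every other vertex of $e$, and neighbors whose
total exceeds 10% of the average hyperedge weight are returned as the
candidates for a pseudo hyperedge. The weight assigned to the pseudo
hyperedge is not specified in the text and is not modeled here; the
function names and data are hypothetical.

```python
def connectivity_to_neighbors(removed, hyperedges, edge_weights):
    """Clique-model connectivity between the removed vertex and each of its neighbors."""
    conn = {}
    for edge, weight in zip(hyperedges, edge_weights):
        if removed in edge and len(edge) > 1:
            share = weight / (len(edge) - 1)  # weight of each clique edge
            for v in edge:
                if v != removed:
                    conn[v] = conn.get(v, 0.0) + share
    return conn


def pseudo_hyperedge_candidates(removed, hyperedges, edge_weights, threshold_frac=0.10):
    """Neighbors whose connectivity exceeds threshold_frac of the average hyperedge weight."""
    avg_weight = sum(edge_weights) / len(edge_weights)
    conn = connectivity_to_neighbors(removed, hyperedges, edge_weights)
    return sorted(v for v, c in conn.items() if c > threshold_frac * avg_weight)


# Hypothetical example: vertex 0 shares a 2-pin net with vertex 1
# and a 12-pin net with vertices 1..11 (both of weight 1).
hyperedges = [(0, 1), tuple(range(12))]
edge_weights = [1, 1]

conn = connectivity_to_neighbors(0, hyperedges, edge_weights)
candidates = pseudo_hyperedge_candidates(0, hyperedges, edge_weights)
```

In this example the large net contributes only 1/11 to each of its
cells, which falls below the 10% threshold, so only vertex 1 is
considered closely connected to the removed vertex.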

In addition to the above process, we also apply MC-FM to each of the
sub-hypergraphs containing more than one type ($H_2, \ldots, H_m$).
This allows previously fixed cells to become free and move, which
often results in substantial improvement.

4.2  Multi-Resource  Enforcement  Algorithms 

In analyzing the characteristics of the various multi-resource
circuits we discovered that the different types of vertices are
reasonably well-distributed throughout the underlying hypergraph.
This suggests that the bisections produced by single-constraint
partitioning algorithms, even though they will not be perfectly
balanced, will not be arbitrarily unbalanced either. Moreover, since
these partitionings can be computed using state-of-the-art multilevel
schemes, they will have small cuts. Motivated by this observation, we
developed two schemes that take as input a min-cut single-constraint
partitioning and try to enforce the various multi-resource balance
constraints.

4.2.1  Single-Constraint  Direct-Balancing  (SCDB) 

In this method, we use the multilevel single-constraint partitioner
hMETIS to seed the initial bisection. Then we use an explicit
balancing algorithm to balance the multiple resources in a single
step. This multi-constraint balancing algorithm operates very
similarly to MC-FM (described in Section 4.1.2), except that it gives
priority to finding a balanced bisection rather than minimizing the
cut. This balancing step tends to increase the cut, especially when
the number of constraints is large. Hence, it is imperative to apply
multi-constraint refinement after obtaining a feasible bisection.
Therefore, a single iteration of MC-FM is applied at that point in an
effort to improve the cut quality.

4.2.2  Single-Constraint  Multi-Phase  Balancing  (SCMB) 

As in the previous algorithm (Section 4.2.1), we use hMETIS to obtain
an initial solution and then fix all the cells of the types that
satisfy the balancing constraints. For the unbalanced types, we order
them from least unbalanced to most unbalanced, and then bisect each
of them in the way described in Section 4.1.1. After each unbalanced
type is balanced, we also apply an iteration of MC-FM to capitalize
on the perturbation caused during balancing.

4.3  Additional  Improvements 

After the bisection of the original hypergraph has been computed, it
is possible to further improve the cut by applying a multi-constraint
V-cycle. The multi-constraint V-cycle consists of two components:
restricted multi-constraint coarsening and multi-constraint
refinement. The restricted multi-constraint coarsening step differs
from regular multi-constraint coarsening by the presence of an
additional requirement that any two vertices that are collapsed
together belong to the same partition. The information regarding the
partitioning is thus preserved during the creation of successive
approximate hypergraphs. This coarsening scheme is a multi-constraint
version of the restricted coarsening presented in [9]. The second
component is the same as the multi-constraint refinement presented in
Section 4.1.2.

Table 2: The characteristics of netlists used for evaluating
algorithms

5.  EXPERIMENTS 

We experimentally evaluated our multi-resource aware partitioning
algorithms on an industrial benchmark suite consisting of twelve
large designs synthesized for the Virtex II architecture [15]. The
types of cells consist of sub-CLB elements such as LUTs, FFs, MUXes,
and control gates, and non-CLB elements such as RAM blocks, DCMs,
IOBs, etc. The details of these benchmarks are listed in Table 2. The
column labeled “# types” shows the number of distinct types of cells
present in that particular benchmark. The column labeled “min” shows
the minimum number of cells of any type for that benchmark, and
similarly the “max” and “avg” columns provide the details of the
distribution of the number of cells of each type in each hypergraph.

To evaluate the quality of the solutions obtained by the various
multi-resource partitioning algorithms, we used hMETIS (version 1.5.3
[10]) to obtain single-constraint bisections of the different
hypergraphs. These solutions were obtained using hMETIS's default
parameters (including a V-cycle at the end). Furthermore, to make
such quality comparisons easier, we computed the Average Ratio of
Quality (ARQ) of each algorithm against that obtained by hMETIS. To
ensure the meaningful averaging of these ratios, we first took the
$\log_2$-values of these ratios, then calculated their mean $\mu$,
and then used $2^{\mu}$ as their average. This method ensures that
ratios corresponding to comparable degradations or improvements
(i.e., ratios that are less than or greater than one) are given equal
importance. An ARQ larger than 1.0 indicates a degradation in
quality.
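
The ARQ computation described above amounts to a geometric mean of the
per-benchmark cut ratios. A minimal Python sketch, with a hypothetical
function name and made-up cut values, is:

```python
import math


def average_ratio_of_quality(cuts, baseline_cuts):
    """Average the per-benchmark ratios cut/baseline in log2 space and map back."""
    logs = [math.log2(c / b) for c, b in zip(cuts, baseline_cuts)]
    mu = sum(logs) / len(logs)
    return 2.0 ** mu


# Illustrative values only: one cut twice as large as the baseline
# and one cut half as large.
arq = average_ratio_of_quality([200, 50], [100, 100])
arithmetic_mean = (200 / 100 + 50 / 100) / 2
```

For these two ratios (2.0 and 0.5) the log-based average is 1.0, so the
doubling and the halving cancel out, whereas the plain arithmetic mean
of the ratios would be 1.25 and would overstate the degradation.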

To ensure the statistical significance of our experimental results,
for both hMETIS and each one of the five multi-resource partitioning
algorithms we report the average min-cut of ten runs.

5.1  Comparison  of  Native  Algorithms 

Table 3: Performance of algorithms as an average of 10 runs for
49%-51% balance constraint.

Table 4: Performance of algorithms as an average of 10 runs for
45%-55% balance constraint.

Tables 3 and 4 show the results obtained by the various native
multi-resource partitioning algorithms (described in Section 4.1) for
49%-51% and 45%-55% balance, respectively. Each of these tables shows
the average minimum cuts obtained by the MP, MC, and MPMC
multi-resource partitioning algorithms under two different scenarios.
In the first scenario, the solution obtained by these algorithms was
kept as it was, whereas in the second scenario, the solution was
further refined by performing a V-cycle refinement step (as discussed
in Section 4.3).

The columns labeled “hMETIS” show the average min-cut obtained by
hMETIS for either 49%-51% or 45%-55% balance. Note that hMETIS's
bisections will not necessarily solve the multi-resource problem, as
they do not account for the different vertex types.

Finally, the rows labeled “ARQ” provide the average ratio of quality
of each algorithm to hMETIS's results (computed using the scheme
described in the previous section), and the rows labeled “Time” show
the amount of time required by the multi-resource partitioning
algorithms relative to that required by hMETIS. Numbers less than one
represent runtimes that are smaller than that of hMETIS, whereas
numbers greater than one represent higher runtimes.

Comparing the results in these tables we can see that all schemes
produce solutions whose cuts are worse than those produced by hMETIS.
This should not be surprising, as hMETIS solves the single-constraint
bisectioning problem, whose solutions in general do not satisfy the
multi-resource balance constraints.

Table 5: Performance of algorithms combined with multi-constraint
V-cycle as an average of 10 runs for 49%-51% balance factor.

Comparing the solutions produced by the various multi-resource
partitioning algorithms we can see that there is a considerable
amount of variability in the quality of the final solutions. In
particular, in the absence of V-cycle refinement, the quality of the
solutions produced by MP is significantly worse than that of the
solutions produced by either MC or MPMC. On average, the 49%-51% cuts
produced by MP are 4.4 times worse than those produced by the
single-constraint hMETIS, whereas the cuts produced by MC and MPMC
are only 55.4% and 50% worse than hMETIS's cuts, respectively.
Similar trends can also be observed for the 45%-55% cuts. These
results illustrate that the multi-constraint algorithm (MC) and the
modifications to the multi-phase partitioning algorithm implemented
in the MPMC algorithm lead to superior solutions.

Comparing the results without and with V-cycle refinement we see that
the overall quality of all three algorithms improves by using V-cycle
refinement. However, the overall rate of improvement is different for
different schemes. The MP algorithm gains the most, whereas the MPMC
algorithm gains the least. We believe that the reason for this is
that the solutions of MC and MPMC are already of reasonably high
quality, and thus there is relatively little room for improvement.
However, because MP's initial solution is considerably worse,
applying a V-cycle refinement can achieve dramatic quality
improvements. As a result, the 49%-51% solution for MP now becomes
only 88.2% worse than that of hMETIS.

Finally, comparing MC with MPMC, we can see that the latter leads to
consistently better solutions, which are on average 5%-10% better than
those obtained by MC. However, this quality advantage comes at the
expense of higher computational requirements. In general, MPMC requires
2.5 to 5.0 times more time than MC. Note that the runtimes of MP and MC
without V-cycle refinement are in general smaller than that of hMETIS
because hMETIS does perform a V-cycle refinement at the end.

5.2  Comparison  of  Enforcement  Algorithms 

Tables 5 and 6 show the results obtained by the various
enforcement-based multi-resource partitioning algorithms (described in
Section 4.2) for 49%-51% and 45%-55% balance, respectively. Each of
these tables shows the average minimum cuts obtained by the SCDB and
SCMB partitioning algorithms without and with V-cycle refinement. In
addition, the columns labeled “hMETIS” show the results obtained by
hMETIS (which are identical to those shown in Tables 3 and 4). The rows
labeled “ARQ” provide the average ratio of the quality of each algorithm
to hMETIS's results, and the rows labeled “Time” show the amount of time
required by the multi-resource partitioning algorithms relative to that
required by hMETIS.
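
As an illustration of how the ARQ and Time rows can be read, the following Python sketch computes an average ratio relative to a baseline. The exact averaging behind the tables is not spelled out in this section, so the arithmetic mean of per-benchmark ratios is an assumption, and the function name and the cut values are hypothetical placeholders rather than data from the tables.

```python
from statistics import mean


def average_ratio(algorithm_values, baseline_values):
    """Mean of per-benchmark ratios algorithm/baseline (assumed ARQ definition)."""
    if len(algorithm_values) != len(baseline_values):
        raise ValueError("need one value per benchmark for both algorithms")
    return mean(a / b for a, b in zip(algorithm_values, baseline_values))


# Hypothetical cuts for three benchmarks (illustrative only).
baseline_cuts = [100.0, 200.0, 400.0]
algorithm_cuts = [110.0, 210.0, 400.0]

arq = average_ratio(algorithm_cuts, baseline_cuts)
percent_worse = (arq - 1.0) * 100.0
print(f"ARQ = {arq:.3f}, i.e. {percent_worse:.1f}% worse than the baseline")
```

Under this reading, an ARQ of 1.057 would correspond to a cut that is on average 5.7% higher than the baseline's, and the same function applied to runtimes would give a relative time.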

Comparing the solutions produced by the two sets of enforcement-based
multi-resource partitioning algorithms, we can see that, unlike the
native algorithms, there is relatively little variation between the
performance achieved by them. Specifically, the performance difference
between the two schemes is less than 7% on average. However, the SCMB
algorithm is consistently better than SCDB, leading to better solutions
in 31 out of the 48 different experimental data-points. Comparing the
results without and with V-cycle refinement, we see that, as was the
case with the native algorithms, the overall quality of the two
algorithms improves. However, those improvements are relatively small,
ranging on average between 2% and 5%. Finally, comparing the amount of
time required by these algorithms, we can see that SCMB is slower than
SCDB, but in most cases the difference is small.

Table 6: Performance of algorithms combined with multi-constraint
V-cycle, averaged over 10 runs, for the 45%-55% balance factor.

5.3  Overall  Comparisons 

Comparing the performance achieved by the various multi-resource
partitioning algorithms, we can see that in almost all cases the
enforcement-based algorithms lead to solutions that have a lower cut
than those obtained by the native multi-resource partitioning
algorithms. For example, the best-performing enforcement-based scheme,
SCMB, outperforms the best-performing native scheme in 41 out of 48
data-points. Moreover, the cut differences are considerable, and on
average SCMB leads to cuts that are 13%-32% better than those of MPMC.
However, this performance advantage is also data-set dependent, and the
relative performance of the various schemes can change for different
benchmarks.

Finally, comparing the performance achieved by SCMB against that
achieved by the single-constraint hMETIS, we can see that the overall
increase in the cut resulting from solving the multi-resource
partitioning problem is quite small. For example, if we consider SCMB's
results with V-cycle refinement, we can see that on average the cut
increases by only 5.7% and 3.3% for the 49%-51% and 45%-55% balance
constraints, respectively.

6.  CONCLUSION 

In this paper we presented two classes of multi-resource aware
partitioning algorithms for enabling partitioning-based placement
methods for FPGA architectures with heterogeneous devices. These
algorithms are very effective in minimizing the cut while satisfying
multiple balancing requirements with acceptable computational effort.
The average cut of the most effective algorithm is only 5.7% and 3.3%
worse than that of the state-of-the-art partitioning tool hMETIS [10]
for 49%-51% and 45%-55% balance constraints, respectively. Moreover,
their additional computational requirements are small, requiring only
two to three times more time than hMETIS.

These  results  indicate  that  high-quality  partitionings  are  feasible 
for  designs  with  multiple  resource  requirements,  suggesting  that 
partitioning-based  placement  methods  can  be  used  for  placing  such 
designs  on  modern  FPGA  architectures. 